Evaluating Gene Set Enrichment Analysis Via a Hybrid Data Model
Authors
Abstract
Gene set enrichment analysis (GSA) methods have been widely adopted by biological labs to analyze data and generate hypotheses for validation. Most of the existing comparison studies focus on whether the existing GSA methods can produce accurate P-values; however, practitioners are often more concerned with the correct gene-set ranking generated by the methods. The ranking performance is closely related to two critical goals associated with GSA methods: the ability to reveal biological themes and ensuring reproducibility, especially for small-sample studies. We have conducted a comprehensive simulation study focusing on the ranking performance of seven representative GSA methods. We overcome the limitation on the availability of real data sets by creating hybrid data models from existing large data sets. To build the data model, we pick a master gene from the data set to form the ground truth and artificially generate the phenotype labels. Multiple hybrid data models can be constructed from one data set and multiple data sets of smaller sizes can be generated by resampling the original data set. This approach enables us to generate a large batch of data sets to check the ranking performance of GSA methods. Our simulation study reveals that for the proposed data model, the Q2 type GSA methods have in general better performance than other GSA methods and the global test has the most robust results. The properties of a data set play a critical role in the performance. For the data sets with highly connected genes, all GSA methods suffer significantly in performance.
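The hybrid data model described in the abstract can be illustrated with a small sketch. The abstract does not say how the phenotype labels are derived from the master gene, so the median split used here is only an assumption, as are the function names (`label_by_master_gene`, `truth_gene_sets`, `resample_balanced`), the toy expression values, and the toy gene sets. The sketch shows the three steps: generate labels from a master gene, mark the gene sets containing that gene as ground truth, and draw a smaller data set by resampling.

```python
import random
import statistics


def label_by_master_gene(expression, master_gene):
    """Assign artificial phenotype labels from one master gene.

    Assumption (not stated in the abstract): samples whose master-gene
    expression is above the median get label 1, all others get label 0.
    """
    values = expression[master_gene]
    cutoff = statistics.median(values)
    return [1 if v > cutoff else 0 for v in values]


def truth_gene_sets(gene_sets, master_gene):
    """Gene sets containing the master gene are treated as ground truth."""
    return {name for name, genes in gene_sets.items() if master_gene in genes}


def resample_balanced(expression, labels, n_per_class, rng):
    """Draw a smaller data set with n_per_class samples from each label."""
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    chosen = rng.sample(idx0, n_per_class) + rng.sample(idx1, n_per_class)
    sub_expression = {g: [vals[i] for i in chosen] for g, vals in expression.items()}
    sub_labels = [labels[i] for i in chosen]
    return sub_expression, sub_labels


# Toy stand-in for a large expression data set: 6 genes, 40 samples.
rng = random.Random(0)
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
expression = {g: [rng.gauss(0.0, 1.0) for _ in range(40)] for g in genes}
gene_sets = {
    "set_A": {"g1", "g2"},
    "set_B": {"g3", "g4"},
    "set_C": {"g5", "g6"},
}

master = "g1"  # choosing a different master gene yields a different hybrid model
labels = label_by_master_gene(expression, master)
truth = truth_gene_sets(gene_sets, master)
small_expression, small_labels = resample_balanced(expression, labels, 5, rng)
```

A GSA method would then be run on `small_expression` and `small_labels`, and the rank it assigns to the gene sets in `truth` would serve as the performance measure. Repeating the resampling step gives many small-sample data sets from one original data set.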
Similar resources
Presenting a Hybrid Approach based on Two-stage Data Envelopment Analysis to Evaluating Organization Productivity
Measuring the performance of a production system has been an important task in management for purposes of control, planning, etc. Lord Kelvin said: “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” Hence, manag...
A Model for Project Selecting with Limited Resources in Data Envelopment Analysis with Input and Output Fuzzy
In evaluating performance, selecting a subset from a set of solutions with limited resources is essential. If there is more than one input and output, the data envelopment analysis optimization models are evaluated, and performance is measured as the weighted output divided by the weighted input. In this research, two models of optimization with limited resources are presented from data envelopment ...
Evaluating the Effect of Various Parameters of Protective Spur Dike on Scour Depth Reduction using Group Method of Data Handling (GMDH) and Gene Expression Programming (GEP)
Spur dikes are one of the common methods to protect rivers against erosion. Scouring around the spur dike is an important factor that can disrupt structural performance. Using a protective spur dike is a proper technique to reduce the scour amount. In this research, the GMDH and GEP models were used to evaluate and estimate the effect of various parameters of protective spur dike o...
Interpreting Gene Expression Data by Searching for Enriched Gene Sets
This paper presents a novel method integrating gene-gene interaction information and Gene Ontology for the construction of new gene sets that are potentially enriched. Enrichment of a gene set is determined by Gene Set Enrichment Analysis, which is a microarray data analysis method that uses ranks of the genes, according to their differential expression values, to identify significant biologi...
Evaluation and ranking of suppliers with fuzzy DEA and PROMETHEE approach
Supplier selection is a multi-criteria problem. This study proposes a hybrid model for supporting supplier selection and ranking. This research is a two-stage model designed to fully rank the suppliers where each supplier has multiple inputs and outputs. First, the supplier evaluation problem is formulated by Data Envelopment Analysis (DEA), since the regarded decision deals with uncertai...